Cynthia Rudin
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > Experimental Study (0.68)
- Research Report > Strength High (0.46)
- Research Report > New Finding (0.46)
Using Noise to Infer Aspects of Simplicity Without Learning
Zachery Boner, Harry Chen
Noise in data significantly influences decision-making in the data science process. In fact, it has been shown that noise in data generation processes leads practitioners to find simpler models. However, an open question still remains: what is the degree of model simplification we can expect under different noise levels? In this work, we address this question by investigating the relationship between the amount of noise and model simplicity across various hypothesis spaces, focusing on decision trees and linear models. We formally show that noise acts as an implicit regularizer for several different noise models. Furthermore, we prove that Rashomon sets (sets of near-optimal models) constructed with noisy data tend to contain simpler models than corresponding Rashomon sets with non-noisy data. Additionally, we show that noise expands the set of "good" features and consequently enlarges the set of models that use at least one good feature. Our work offers theoretical guarantees and practical insights for practitioners and policymakers on whether simple-yet-accurate machine learning models are likely to exist, based on knowledge of noise levels in the data generation process.
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > Wisconsin (0.04)
- North America > United States > Florida > Broward County (0.04)
- North America > Dominican Republic (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Government (1.00)
- Health & Medicine (0.93)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
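The abstract's claim that Rashomon sets built from noisy data tend to contain simpler models can be illustrated with a small calculation. The sketch below is not taken from the paper. It assumes binary labels that are each flipped independently with probability rho < 0.5, in which case a model with clean 0-1 loss L has expected noisy loss rho + (1 - 2*rho) * L. The model names, their clean losses, and the additive threshold epsilon are hypothetical values chosen only for illustration.

```python
def noisy_loss(clean_loss: float, rho: float) -> float:
    """Expected 0-1 loss when each binary label is flipped with probability rho.

    A correct prediction becomes wrong with probability rho and a wrong one
    becomes correct with probability rho, so the expected loss is
    clean_loss * (1 - rho) + (1 - clean_loss) * rho.
    """
    return rho + (1.0 - 2.0 * rho) * clean_loss


def rashomon_set(losses: dict[str, float], epsilon: float) -> list[str]:
    """Names of all models whose loss is within epsilon of the best loss."""
    best = min(losses.values())
    return sorted(name for name, loss in losses.items() if loss <= best + epsilon)


# Hypothetical clean losses: deeper trees fit better, shallower trees are simpler.
clean = {"depth-1 tree": 0.20, "depth-2 tree": 0.14, "depth-4 tree": 0.10}
epsilon = 0.05  # assumed additive Rashomon threshold

for rho in (0.0, 0.2, 0.3):
    noisy = {name: noisy_loss(loss, rho) for name, loss in clean.items()}
    print(rho, rashomon_set(noisy, epsilon))
```

Because noise shrinks every loss gap by the factor (1 - 2*rho) while epsilon stays fixed, the simplest model in this made-up example enters the set only at the highest noise level, rho = 0.3.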
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Virginia (0.04)
- North America > United States > Ohio (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
SORTeD Rashomon Sets of Sparse Decision Trees: Anytime Enumeration
Arslan, Elif, van der Linden, Jacobus G. M., Hoogendoorn, Serge, Rinaldi, Marco, Demirović, Emir
Sparse decision tree learning provides accurate and interpretable predictive models that are ideal for high-stakes applications by finding the single most accurate tree within a (soft) size limit. Rather than relying on a single "best" tree, Rashomon sets (trees with similar performance but varying structures) can be used to enhance variable importance analysis, enrich explanations, and enable users to choose simpler trees or those that satisfy stakeholder preferences (e.g., fairness) without hard-coding such criteria into the objective function. However, because finding the optimal tree is NP-hard, enumerating the Rashomon set is inherently challenging. Therefore, we introduce SORTD, a novel framework that improves scalability and enumerates trees in the Rashomon set in order of the objective value, thus offering anytime behavior. Our experiments show that SORTD reduces runtime by up to two orders of magnitude compared with the state of the art. Moreover, SORTD can compute Rashomon sets for any separable and totally ordered objective and supports post-evaluating the set using other separable (and partially ordered) objectives. Together, these advances make exploring Rashomon sets more practical in real-world applications.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Netherlands > South Holland > Delft (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Health & Medicine (1.00)
- Education (0.92)
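The idea of enumerating a Rashomon set "in order of the objective value" with anytime behavior can be sketched as follows. This is not SORTD's algorithm, which the abstract does not spell out. It assumes an additive objective (a tree's cost is the sum of its left and right subtree costs, one example of a separable objective) and that each subtree's candidate costs are already sorted. The function names `ordered_sums` and `rashomon_prefix` and the cost lists are hypothetical.

```python
import heapq
from typing import Iterator


def ordered_sums(left: list[int], right: list[int]) -> Iterator[tuple[int, int, int]]:
    """Lazily yield (left[i] + right[j], i, j) in nondecreasing order of the sum.

    Both inputs must be sorted ascending. Only the frontier of index pairs is
    kept in the heap, so the caller can stop at any time.
    """
    if not left or not right:
        return
    heap = [(left[0] + right[0], 0, 0)]
    seen = {(0, 0)}
    while heap:
        total, i, j = heapq.heappop(heap)
        yield total, i, j
        # The only pairs that can come next are the two successors of (i, j).
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (left[ni] + right[nj], ni, nj))


def rashomon_prefix(
    left: list[int], right: list[int], epsilon: int
) -> Iterator[tuple[int, int, int]]:
    """Yield combinations whose cost is within epsilon of the best, best first."""
    best = None
    for total, i, j in ordered_sums(left, right):
        if best is None:
            best = total  # the first item yielded is the optimum
        if total > best + epsilon:
            break  # everything after this point is worse as well
        yield total, i, j


# Hypothetical sorted subtree costs (for example, misclassification counts).
left_costs = [1, 3, 4]
right_costs = [0, 2, 5]
for total, i, j in rashomon_prefix(left_costs, right_costs, epsilon=2):
    print(total, (i, j))
```

Because results arrive best first, interrupting the loop early still leaves a valid prefix of the Rashomon set, which is the sense of "anytime" used in this sketch.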